Low-resource devices such as radio frequency identification (RFID) tags, internet of things (IoT)
nodes, and wireless sensor networks (WSN) increasingly rely on lightweight cryptography (LWC) for
protection. Designing a lightweight cryptographic technique for such devices must take into account
important factors such as battery life and the amount of data to be processed. This paper presents a
new hardware design for the Loong lightweight cryptographic algorithm that takes these constraints
into account. The new hardware architecture for the Loong algorithm uses resource sharing to reduce
the size of the design. The proposed approach is implemented with Xilinx ISE 14.7 on a Virtex-4 field
programmable gate array (FPGA) platform. The ISE synthesis analysis showed a throughput of
851.264 Mbps with an efficiency of 2.282 Mbps/slice and a power consumption of 0.193 Watt. The
implemented design occupies 373 slices, and the maximum operating frequency is 212.816 MHz. To the
best of our knowledge, this is the first time that the Loong algorithm has been implemented on an
FPGA using the very high-speed integrated circuit hardware description language (VHDL).

Keywords: Lightweight, Loong, VHDL
1. INTRODUCTION

Computer networks play an important role in our daily lives. Used by applications such as the
world wide web (WWW), electronic messaging, and file transfer, they allow us to acquire information
and to communicate and exchange data at any time. It is now possible to connect almost any everyday
object to a network, and the term internet of things (IoT) designates all of these connected objects.
The IoT has many application areas and thus offers immense potential for companies [1], industries,
and users. Lightweight cryptography algorithms are used to develop many applications, such as an
electronic clinical record prototype designed to work with low-performance devices. Such electronic
clinical records seek to secure information without sacrificing performance while keeping
computational consumption low [2].

The block cipher is one of the important types of lightweight cryptography algorithm. Examples
include RoadRunner [3], Shadow [4], KLEIN [5], AES [6], GOST [7], LBlock [8], SFN [9], Midori [10],
TWINE [11], and SPARX [12]. Compared with conventional cryptography, lightweight cryptography (LWC)
uses smaller block sizes, key sizes that mostly range from 80 to 112 bits in line with what the
National Institute of Standards and Technology (NIST) has established, simpler rounds in which 4-bit
S-boxes are preferred to 8-bit S-boxes, and simpler key schedules, since complex sub-key generation
increases memory, latency, and power consumption [13]. Constrained devices have very limited memory,
power, and processing speed. Because of these limitations, and because of the complexity of the
computational operations required in traditional cryptography, conventional encryption cannot be
used on devices with low storage space. As a direct result, the concept of "lightweight
cryptography" emerged. The goal of lightweight cryptography is to reduce hardware-oriented and
software-oriented implementation costs. It refers to encryption technology designed for rapidly
expanding applications that rely heavily on technology with limited resources [14].

In this paper, we consider the security issues of connected objects, which stem on the one hand
from the large amount of data they handle and on the other from the fact that they often operate in a
hostile environment and are physically accessible. We then present a new hardware architecture for
the Loong lightweight block cipher, implemented on a field-programmable gate array (FPGA) platform.
The rest of the paper is organized as follows: section 2 describes the Loong algorithm and its FPGA
implementation, section 3 presents the Loong hardware components, section 4 reports and discusses
the results, and section 5 concludes the work.

Recently, many lightweight block cipher algorithms have been implemented on FPGA boards for IoT
applications. Many lightweight block ciphers have been proposed, e.g. PRINCE [15], LED [16],
mCrypton [17], and PRESENT [18]. All of these ciphers are aimed specifically at extremely
constrained environments such as radio frequency identification (RFID) tags and sensor networks.
Abbas et al. [15] designed a new FPGA IP core that speeds up PRINCE. LED is a symmetric block cipher
with a 64-bit block whose internal architecture is based on the substitution-permutation network
(SPN). It comes in two versions according to the key size: a 64-bit key (LED-64) and a 128-bit key
(LED-128). Its number of rounds depends on the key size: LED-64 has 32 rounds while LED-128 has
48 rounds [16]. The mCrypton algorithm is a 64-bit lightweight block cipher whose architecture is
based on a substitution-permutation (SP) structure [17]. A set of existing optimized lightweight
cryptographic architectures is discussed here. Singh et al. [8] considered LBlock, a 64-bit block
cipher with an 80-bit key and 32 rounds. Mhaouch et al. [18] considered PRESENT, a cipher based on a
substitution-permutation network (SPN) that supports 64-bit input data blocks and key sizes of 80
and 128 bits. In [19], the LILLIPUT cipher transforms 64 bits of plaintext into 64 bits of
ciphertext with an 80-bit key. Anusha and Shastrimath [20] presented XTEA, which has a 64-bit block
size (plaintext) and a 128-bit key size. Zeebaree [21] considered DES, in which data are encrypted
in 64-bit blocks using a 56-bit key. To make the comparison as precise as possible, the same
security level, technology, and hardware/software implementation complexity factors (chip area,
throughput, latency, and power consumption) were used for each of the block ciphers under
consideration [22], [23].


2. METHOD 
2.1. Loong algorithm 

Loong is an SPN-based symmetric lightweight block cipher with a 64-bit block and a key length of
64, 80, or 128 bits. The round number (RN) is 16, 20, or 32, respectively. The key length determines
which of the three variants, Loong-64, Loong-80, or Loong-128, is applied in the encryption process.
The round function is built from the sub-functions AddRoundKey, SubCells, MixRows, and MixColumns,
applied in the order SubCells, MixRows, MixColumns, SubCells, AddRoundKey. According to [24], the
cipher gets its name from the fact that its round function uses two SubCells layers. The round
functions AddRoundKey, SubCells, MixRows, and MixColumns are shown in Figure 1.

It is well known that AES is built on the SPN structure. Compared with the Feistel network
structure, the SPN structure yields round functions with a greater degree of confusion and
diffusion, so SPN-based algorithms offer high efficiency and security. However, in SPN-based
algorithms the encryption and decryption procedures generally differ [25]. To solve this problem,
Loong adopts a new SPN structure in which the encryption and decryption operations are exactly the
same. The Loong structure is therefore both efficient and secure.

Accordingly, the SPN structure used by Loong has been redesigned. Loong's round function consists
of four round transformations: SubCells (SC), MixRows (MR), MixColumns (MC), and AddRoundKey (ARK),
as shown in Figure 2. All of these components are involutory [24]. The round function used in
Loong's encryption is given by (1).
$$\mathrm{ENC}_{RN}[RC^{0},\ldots,RC^{RN}] = \mathrm{ARK}(RK,RC^{0}) \circ \Big(\bigcirc_{r=1}^{RN}\, \mathrm{SC}\circ\mathrm{MR}\circ\mathrm{MC}\circ\mathrm{SC}\circ\mathrm{ARK}(RK,RC^{r})\Big) \qquad [24] \quad (1)$$
As a result, the encryption and decryption procedures of Loong are identical, except that
decryption uses the round constants in the opposite order to encryption. A proof that the two
procedures are equivalent is given in [24].


Figure 1. Process function of Loong algorithm [24]


Figure 2. Diffusion effect in Loong [24]


2.1.1. Encryption process using Loong algorithm 

Loong encrypts a 64-bit block under a 64-bit, 80-bit, or 128-bit key, with a round number (RN) of
16, 20, or 32, respectively. The key length determines which of the three designations the
algorithm is given: Loong-64, Loong-80, or Loong-128 [24]. The only difference between Loong's
encryption and decryption is that the two use the round constants in opposite order. Figure 3 shows
the encryption process of Loong, which is built from the round functions AddRoundKey, SubCells,
MixRows, and MixColumns [24].

The Loong encryption procedure is denoted ENC_RN. ENC_RN takes as input the plaintext, which can
be segmented into 64-bit plaintext blocks, together with a primary key. The encryption procedures
of Loong-64, Loong-80, and Loong-128 are denoted ENC_16, ENC_20, and ENC_32, respectively [24].


$$\mathrm{ENC}_{RN}(\text{plaintext}, \text{key}) \rightarrow \text{ciphertext} \qquad [24] \quad (2)$$


Here RN is 16, 20, or 32, and the corresponding key lengths K_16, K_20, and K_32 are 64, 80, and
128 bits. Algorithm 1 illustrates ENC_RN using a 64-bit round key (RK), which is produced by the
key schedule.


Algorithm 1. Loong routine ENC_RN

Input: Plaintext, RK, RC;
Output: Ciphertext;

state ← Plaintext;
AddRoundKey (state, RK, RC);
for i = 1 to RN do
    SubCells (state);
    MixRows (state);
    MixColumns (state);
    SubCells (state);
    AddRoundKey (state, RK, RC);
endfor
Ciphertext ← state;
Return Ciphertext;
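
The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal structural sketch, not the Loong specification: the S-box table `SBOX`, the mixing function `_mix4`, and the passing of the round constants as a list `rcs` are placeholders assumed here for illustration, and the actual components are defined in [24]. The placeholders were chosen to be involutory so that the sketch can exhibit the property stated in the text, namely that the same routine run with the round constants reversed undoes the encryption.

```python
# Structural sketch of Algorithm 1. The S-box and the mixing layer are
# placeholders (assumptions), NOT the Loong components specified in [24].

# Hypothetical involutory 4-bit S-box: SBOX[SBOX[x]] == x for every x.
SBOX = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
        0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def to_cells(block):
    """Split a 64-bit integer into sixteen 4-bit cells (row-major 4x4)."""
    return [(block >> (60 - 4 * i)) & 0xF for i in range(16)]


def from_cells(cells):
    """Pack sixteen 4-bit cells back into a 64-bit integer."""
    block = 0
    for c in cells:
        block = (block << 4) | c
    return block


def sub_cells(cells):
    return [SBOX[c] for c in cells]


def _mix4(v):
    # Placeholder involutory mixing: each cell becomes the XOR of the other three.
    t = v[0] ^ v[1] ^ v[2] ^ v[3]
    return [t ^ x for x in v]


def mix_rows(cells):
    out = []
    for r in range(4):
        out += _mix4(cells[4 * r:4 * r + 4])
    return out


def mix_columns(cells):
    out = [0] * 16
    for c in range(4):
        out[c::4] = _mix4(cells[c::4])
    return out


def add_round_key(cells, rk, rc):
    return to_cells(from_cells(cells) ^ rk ^ rc)


def enc_rn(block, rk, rcs):
    """Algorithm 1 with RN = len(rcs) - 1; rcs[0] feeds the initial AddRoundKey."""
    state = add_round_key(to_cells(block), rk, rcs[0])
    for rc in rcs[1:]:
        state = sub_cells(state)
        state = mix_rows(state)
        state = mix_columns(state)
        state = sub_cells(state)
        state = add_round_key(state, rk, rc)
    return from_cells(state)
```

With seventeen arbitrary 64-bit constants (RN = 16), `enc_rn(enc_rn(p, rk, rcs), rk, rcs[::-1])` returns `p` for these placeholder components, which illustrates why a single data path can serve both encryption and decryption.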


2.1.2. Decryption process 

The Loong cipher and its inverse are constructed from the same procedure: the data flow of
decryption follows the same pattern as that of encryption. A ciphertext is decrypted by applying the
round constants in the reverse of the order used for encryption [24]. As a direct consequence,
Loong decryption is as fast and as simple as encryption, see Figure 4.
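
Why reversing the round constants is sufficient can be sketched in a few lines. The sketch assumes, as stated in [24], that SC, MR, and MC are involutory, and it additionally assumes that MR and MC commute; the second assumption is introduced here only for this sketch, and the actual proof is the one in [24]. Write $A_r(x) = x \oplus RK \oplus RC^{r}$ for AddRoundKey with the $r$-th round constant, and use the usual right-to-left composition order:

```latex
\begin{aligned}
F &= \mathrm{SC}\circ\mathrm{MC}\circ\mathrm{MR}\circ\mathrm{SC},
  \qquad F\circ F = \mathrm{id}, \qquad A_r\circ A_r = \mathrm{id},\\
\mathrm{ENC} &= A_{RN}\circ F\circ A_{RN-1}\circ\cdots\circ F\circ A_{0},\\
\mathrm{ENC}^{-1} &= A_{0}^{-1}\circ F^{-1}\circ A_{1}^{-1}\circ\cdots\circ F^{-1}\circ A_{RN}^{-1}
  = A_{0}\circ F\circ A_{1}\circ\cdots\circ F\circ A_{RN}.
\end{aligned}
```

The last expression has exactly the form of ENC with the sequence $RC^{0},\ldots,RC^{RN}$ replaced by $RC^{RN},\ldots,RC^{0}$, so under these assumptions the encryption routine itself performs decryption when it is fed the round constants in reverse order.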


Figure 3. Encryption Loong algorithm


Figure 4. Decryption Loong algorithm


2.2. FPGA implementation 

A field-programmable gate array (FPGA) is used to implement lightweight block ciphers and hash
functions in hardware [25]. Such hardware can be designed with different architectures. The
architecture based on iterative looping has as its primary goal the reduction of the total hardware
resources required by the design [26]. Unlike loop unrolling, which requires a number of round
transformations equal to the algorithm's round count, the iterative design needs only a single
round transformation and the associated registers [27], [28]. Because of this, it is suitable for
applications with limited resources [29]. Each round takes one clock cycle. For this design to work
as intended, the flow of operations must also be controlled by a block of control logic [30], [31].
An FPGA is made up of a matrix of reconfigurable programmable logic units called configurable logic
blocks (CLBs), and FPGAs are used extensively in the development of digital systems [18]. The very
high-speed integrated circuit hardware description language (VHDL) is used to implement the LWC
Loong algorithm. Figure 5 shows the RTL top module of the proposed system. The proposed design
consists of four input ports and one output port and targets the Virtex-4 FPGA platform.
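
The iterative scheme described above can be modelled in a few lines of Python. The class name `IterativeCore`, its `load`/`clock` interface, and the toy round function in the usage note are hypothetical and are not taken from the VHDL design. The sketch only illustrates that a single shared round block, a state register, and a round counter acting as control logic complete an RN-round cipher in RN clock cycles.

```python
class IterativeCore:
    """Toy clocked model: one state register, one shared round block, one counter."""

    def __init__(self, round_fn, rounds):
        self.round_fn = round_fn  # the single round transformation (shared resource)
        self.rounds = rounds      # RN, the number of rounds to iterate
        self.state = 0            # state register
        self.count = 0            # round counter driven by the control logic
        self.busy = False

    def load(self, block):
        """Latch a new input block and restart the round counter."""
        self.state, self.count, self.busy = block, 0, True

    def clock(self):
        """Advance one clock cycle; returns True when the result is ready."""
        if self.busy:
            self.state = self.round_fn(self.state, self.count)
            self.count += 1
            self.busy = self.count < self.rounds
        return not self.busy


def cycles_to_finish(core, block):
    """Count the clock cycles the core needs to process one block."""
    core.load(block)
    cycles = 1
    while not core.clock():
        cycles += 1
    return cycles
```

For instance, `cycles_to_finish(IterativeCore(lambda s, r: s ^ r, 16), 0)` evaluates to 16, one cycle per round, whereas a fully unrolled design would instantiate sixteen copies of the round block instead of reusing one.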


Loong_Enc: inputs Plaintext(63:0), RK(63:0), clk; output Ciphertext(63:0)


Figure 5. Top module of Loong RTL


3. THE LOONG HARDWARE COMPONENT 

Loong is a lightweight block cipher that supports a block length of 64 bits with a key size of
64, 80, or 128 bits. Its data path consists of AddRoundKey, SubCells, MixRows, MixColumns, SubCells,
and AddRoundKey. The Loong decryption unit is almost identical to the encryption design. The only
difference between encryption and decryption is that the round constants are used in inverse order.


3.1. Subcell 

The hardware implementation of the 64-bit S-box layer is presented in Figure 6. The input and
output of the S-box component are 64 bits wide, and the component contains sixteen subcells
(S-boxes), each with a 4-bit input and a 4-bit output. D flip-flops are used to implement the
Subcell in order to reduce the hardware footprint.
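
The structure of this layer, sixteen independent 4-bit lookups over a 64-bit word, can be sketched as below. The table `SBOX4` is a hypothetical involutory S-box used only for illustration; the actual Loong S-box is specified in [24], and the function name `sbox_layer_64` is likewise an assumption.

```python
# Hypothetical involutory 4-bit S-box for illustration only; not Loong's table.
SBOX4 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
         0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def sbox_layer_64(x):
    """Apply the 4-bit S-box to each of the sixteen nibbles of a 64-bit word."""
    y = 0
    for i in range(16):
        nibble = (x >> (4 * i)) & 0xF    # select the i-th 4-bit cell
        y |= SBOX4[nibble] << (4 * i)    # substitute and put it back in place
    return y
```

In hardware the sixteen lookups operate in parallel rather than in a loop. Because the placeholder table is an involution, applying `sbox_layer_64` twice returns the original word.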


S_Box_64_input_64_output: input S_box_input_64_bit(63:0); output s_box_output_64_bit(63:0)


Figure 6. RTL of Subcell


3.2. MixRow
The hardware implementation of the 64-bit MixRow is presented in Figure 7. The encryption process
uses sixteen multiplexers, each with a 4-bit input and a 1-bit output.


SPN_MixRows: input MixRows_SPN_State_In(63:0); output MixRows_SPN_State_Out(63:0)


Figure 7. RTL of MixRow
3.3. MixColumn
Figure 8 shows the RTL of the 64-bit MixColumn. The encryption process uses sixteen multiplexers,
each with a 4-bit input and a 1-bit output.


SPN_Mixcolumns: input SPN_State_In(63:0); output SPN_State_Out(63:0)


Figure 8. RTL of MixColumn


3.4. AddRoundKey

The last component, shown in Figure 9, implements AddRoundKey, which is used twice. The input and
output are 64 bits wide. The first time, the plaintext is XORed with the round key. The second
time, in every round, the output of the MixColumn component is XORed with the round key (RK) and the
round constant (RC).
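
As a minimal sketch of this component, assuming 64-bit integer operands and the hypothetical function name `add_round_key`, the operation is a bitwise XOR of the three inputs that appear as ports in Figure 9:

```python
MASK64 = (1 << 64) - 1  # keep every value within 64 bits


def add_round_key(state, rk, rc):
    """XOR the 64-bit state with the round key and the round constant."""
    return (state ^ rk ^ rc) & MASK64
```

Because XOR is its own inverse, applying `add_round_key` twice with the same `rk` and `rc` restores the original state, which is what makes this component involutory.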


AddRoundKey: inputs AddRoundKey_RC(63:0), AddRoundKey_RK(63:0), AddRoundKey_State(63:0); output AddRoundKey_output(63:0)


Figure 9. RTL of AddRoundKey


4. RESULTS AND DISCUSSION 

The main target of this paper is to implement a small hardware design for the Loong algorithm
with low latency. The permutation, as well as the row-to-column conversion, is implemented using
wires only. The new architecture was designed and implemented in ISE, and the ISim simulator was
then used to run the Loong algorithm with test vectors. The plaintext and the key are both 64 bits
long. Figure 10 shows the simulation outputs of the last three system components together with the
result for the last data input to the design. The proposed architecture requires a relatively low
number of slices while meeting the goals of high frequency and high throughput. In shift mode, the
total number of slices can be reduced by using the lookup table (LUT) technique. The LUT transfers
data by rearranging the positions of the input and output data, and a LUT-based shift operation can
be carried out in a single clock cycle. Figure 11 shows the decryption process.

With this design, a total of sixteen clock cycles is needed to process the inputs and produce the
ciphertext. Reducing the number of slices is important when creating a design for RFID or IoT
applications. Area is defined here as the total number of slices. The performance results of the
design architecture are given in Table 1. They show a high throughput with a small area and low
energy: the proposed design achieves a throughput of 851.264 Mbps, an efficiency of 2.282 Mbps per
slice, and a total power consumption of 0.193 mW.
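
The reported throughput and efficiency are consistent with the usual relation throughput = block size × maximum frequency / clock cycles per block. The paper does not state the formula explicitly, so its use here is an assumption, but it reproduces the reported figures from the 64-bit block, the 212.816 MHz clock, the sixteen clock cycles, and the 373 slices:

```python
BLOCK_BITS = 64          # Loong block size in bits
F_MAX_MHZ = 212.816      # maximum operating frequency reported by synthesis
CYCLES_PER_BLOCK = 16    # clock cycles needed to produce one ciphertext block
SLICES = 373             # occupied slices

# Throughput in Mbps: bits per block times blocks per microsecond.
throughput_mbps = BLOCK_BITS * F_MAX_MHZ / CYCLES_PER_BLOCK
# Efficiency in Mbps per slice.
efficiency = throughput_mbps / SLICES

print(f"{throughput_mbps:.3f} Mbps, {efficiency:.3f} Mbps/slice")
# prints: 851.264 Mbps, 2.282 Mbps/slice
```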
Figure 10. Simulation of Loong encryption
Figure 11. Simulation of Loong decryption


Table 1. Comparison of the encryption and decryption resources of Loong with those of other algorithms


Algorithm       Block size  FPGA device  Max. freq. (MHz)  Throughput (Mbps)  Total slices  Efficiency (Mbps/slice)  Power (mWatt)
Proposed Loong  64          Virtex-4     212.816           851.264            373           2.282                    0.193
LBLock [8]      64          Virtex-4     315.00            635.021            158           4.02                     0.878
PRESENT [21]    64          Virtex-4     364.56            171.56             152           0.041                    248.02
LILLIPUT [22]   64          Virtex-4     654.24            465.24             331300        -                        285.00
XTEA [23]       64          Artix-7      263.762           80.43              238           0.34                     0.222


5. CONCLUSION 

An efficient hardware architecture for the Loong lightweight encryption algorithm is designed and
implemented in this paper. The area and power of the new FPGA-based architecture are optimized by
using a shared-resource structure in which multiple components process the algorithm. All
components are designed to execute within sixteen clock cycles. The implementation of the Loong
algorithm on a Xilinx Virtex-4 FPGA occupies 373 slices, with an operating frequency of 212.816 MHz
and a power consumption of 0.193 mWatt. In addition, it achieves a total throughput of 851.264 Mbps
with a slice efficiency of 2.282 Mbps/slice. Finally, these results show that the proposed hardware
design for the LWC Loong algorithm is appropriate for mobile and small devices.